error value
Predicting Critical Heat Flux with Uncertainty Quantification and Domain Generalization Using Conditional Variational Autoencoders and Deep Neural Networks
Alsafadi, Farah, Furlong, Aidan, Wu, Xu
Deep generative models (DGMs) have proven to be powerful in generating realistic data samples. Their capability to learn the underlying distribution of a dataset enable them to generate synthetic data samples that closely resemble the original training dataset, thus addressing the challenge of data scarcity. In this work, we investigated the capabilities of DGMs by developing a conditional variational autoencoder (CVAE) model to augment the critical heat flux (CHF) measurement data that was used to generate the 2006 Groeneveld lookup table. To determine how this approach compared to traditional methods, a fine-tuned deep neural network (DNN) regression model was created and evaluated with the same dataset. Both the CVAE and DNN models achieved small mean absolute relative errors, with the CVAE model maintaining more favorable results. To quantify the uncertainty in the model's predictions, uncertainty quantification (UQ) was performed with repeated sampling of the CVAE model and ensembling of the DNN model. Following UQ, the DNN ensemble notably improved performance when compared to the baseline DNN model, while the CVAE model achieved similar results to its non-UQ results. The CVAE model was shown to have significantly less variability and a higher confidence after assessment of the prediction-wise relative standard deviations. Evaluating domain generalization, both models achieved small mean error values when predicting both inside and outside the training domain, with predictions outside the training domain showing slightly larger errors. Overall, the CVAE model was comparable to the DNN regression model in predicting CHF values but with better uncertainty behavior.
Mathematical Programming Algorithms for Convex Hull Approximation with a Hyperplane Budget
Barbato, Michele, Ceselli, Alberto, Messana, Rosario
We consider the following problem in computational geometry: given, in the d-dimensional real space, a set of points marked as positive and a set of points marked as negative, such that the convex hull of the positive set does not intersect the negative set, find K hyperplanes that separate, if possible, all the positive points from the negative ones. That is, we search for a convex polyhedron with at most K faces, containing all the positive points and no negative point. The problem is known in the literature for pure convex polyhedral approximation; our interest stems from its possible applications in constraint learning, where points are feasible or infeasible solutions of a Mixed Integer Program, and the K hyperplanes are linear constraints to be found. We cast the problem as an optimization one, minimizing the number of negative points inside the convex polyhedron, whenever exact separation cannot be achieved. We introduce models inspired by support vector machines and we design two mathematical programming formulations with binary variables. We exploit Dantzig-Wolfe decomposition to obtain extended formulations, and we devise column generation algorithms with ad-hoc pricing routines. We compare computing time and separation error values obtained by all our approaches on synthetic datasets, with number of points from hundreds up to a few thousands, showing our approaches to perform better than existing ones from the literature. Furthermore, we observe that key computational differences arise, depending on whether the budget K is sufficient to completely separate the positive points from the negative ones or not. On 8-dimensional instances (and over), existing convex hull algorithms become computational inapplicable, while our algorithms allow to identify good convex hull approximations in minutes of computation.
Decentralized State-Dependent Markov Chain Synthesis with an Application to Swarm Guidance
Uzun, Samet, Ure, Nazim Kemal, Acikmese, Behcet
This paper introduces a decentralized state-dependent Markov chain synthesis (DSMC) algorithm for finite-state Markov chains. We present a state-dependent consensus protocol that achieves exponential convergence under mild technical conditions, without relying on any connectivity assumptions regarding the dynamic network topology. Utilizing the proposed consensus protocol, we develop the DSMC algorithm, updating the Markov matrix based on the current state while ensuring the convergence conditions of the consensus protocol. This result establishes the desired steady-state distribution for the resulting Markov chain, ensuring exponential convergence from all initial distributions while adhering to transition constraints and minimizing state transitions. The DSMC's performance is demonstrated through a probabilistic swarm guidance example, which interprets the spatial distribution of a swarm comprising a large number of mobile agents as a probability distribution and utilizes the Markov chain to compute transition probabilities between states. Simulation results demonstrate faster convergence for the DSMC based algorithm when compared to the previous Markov chain based swarm guidance algorithms.
Auto-PINN: Understanding and Optimizing Physics-Informed Neural Architecture
Wang, Yicheng, Han, Xiaotian, Chang, Chia-Yuan, Zha, Daochen, Braga-Neto, Ulisses, Hu, Xia
Physics-informed neural networks (PINNs) are revolutionizing science and engineering practice by bringing together the power of deep learning to bear on scientific computation. In forward modeling problems, PINNs are meshless partial differential equation (PDE) solvers that can handle irregular, high-dimensional physical domains. Naturally, the neural architecture hyperparameters have a large impact on the efficiency and accuracy of the PINN solver. However, this remains an open and challenging problem because of the large search space and the difficulty of identifying a proper search objective for PDEs. Here, we propose Auto-PINN, the first systematic, automated hyperparameter optimization approach for PINNs, which employs Neural Architecture Search (NAS) techniques to PINN design. Auto-PINN avoids manually or exhaustively searching the hyperparameter space associated with PINNs. A comprehensive set of pre-experiments using standard PDE benchmarks allows us to probe the structure-performance relationship in PINNs. We find that the different hyperparameters can be decoupled, and that the training loss function of PINNs is a good search objective. Comparison experiments with baseline methods demonstrate that Auto-PINN produces neural architectures with superior stability and accuracy over alternative baselines.
Active-Learning-Driven Surrogate Modeling for Efficient Simulation of Parametric Nonlinear Systems
Kapadia, Harshit, Feng, Lihong, Benner, Peter
When repeated evaluations for varying parameter configurations of a high-fidelity physical model are required, surrogate modeling techniques based on model order reduction are desired. In absence of the governing equations describing the dynamics, we need to construct the parametric reduced-order surrogate model in a non-intrusive fashion. In this setting, the usual residual-based error estimate for optimal parameter sampling associated with the reduced basis method is not directly available. Our work provides a non-intrusive optimality criterion to efficiently populate the parameter snapshots, thereby, enabling us to effectively construct a parametric surrogate model. We consider separate parameter-specific proper orthogonal decomposition (POD) subspaces and propose an active-learning-driven surrogate model using kernel-based shallow neural networks, abbreviated as ActLearn-POD-KSNN surrogate model. To demonstrate the validity of our proposed ideas, we present numerical experiments using two physical models, namely Burgers' equation and shallow water equations. Both the models have mixed -- convective and diffusive -- effects within their respective parameter domains, with each of them dominating in certain regions. The proposed ActLearn-POD-KSNN surrogate model efficiently predicts the solution at new parameter locations, even for a setting with multiple interacting shock profiles.
Local Interpretability of Random Forests for Multi-Target Regression
Bardos, Avraam, Mylonas, Nikolaos, Mollas, Ioannis, Tsoumakas, Grigorios
Multi-target regression is useful in a plethora of applications. Although random forest models perform well in these tasks, they are often difficult to interpret. Interpretability is crucial in machine learning, especially when it can directly impact human well-being. Although model-agnostic techniques exist for multi-target regression, specific techniques tailored to random forest models are not available. To address this issue, we propose a technique that provides rule-based interpretations for instances made by a random forest model for multi-target regression, influenced by a recent model-specific technique for random forest interpretability. The proposed technique was evaluated through extensive experiments and shown to offer competitive interpretations compared to state-of-the-art techniques.
Decision-Focused Evaluation: Analyzing Performance of Deployed Restless Multi-Arm Bandits
Verma, Paritosh, Verma, Shresth, Mate, Aditya, Taneja, Aparna, Tambe, Milind
Restless multi-arm bandits (RMABs) is a popular decision-theoretic framework that has been used to model real-world sequential decision making problems in public health, wildlife conservation, communication systems, and beyond. Deployed RMAB systems typically operate in two stages: the first predicts the unknown parameters defining the RMAB instance, and the second employs an optimization algorithm to solve the constructed RMAB instance. In this work we provide and analyze the results from a first-of-its-kind deployment of an RMAB system in public health domain, aimed at improving maternal and child health. Our analysis is focused towards understanding the relationship between prediction accuracy and overall performance of deployed RMAB systems. This is crucial for determining the value of investing in improving predictive accuracy towards improving the final system performance, and is useful for diagnosing, monitoring deployed RMAB systems. Using real-world data from our deployed RMAB system, we demonstrate that an improvement in overall prediction accuracy may even be accompanied by a degradation in the performance of RMAB system -- a broad investment of resources to improve overall prediction accuracy may not yield expected results. Following this, we develop decision-focused evaluation metrics to evaluate the predictive component and show that it is better at explaining (both empirically and theoretically) the overall performance of a deployed RMAB system.
Transformer-Based Deep Learning Model for Stock Price Prediction: A Case Study on Bangladesh Stock Market
Muhammad, Tashreef, Aftab, Anika Bintee, Ahsan, Md. Mainul, Muhu, Maishameem Meherin, Ibrahim, Muhammad, Khan, Shahidul Islam, Alam, Mohammad Shafiul
In modern capital market the price of a stock is often considered to be highly volatile and unpredictable because of various social, financial, political and other dynamic factors. With calculated and thoughtful investment, stock market can ensure a handsome profit with minimal capital investment, while incorrect prediction can easily bring catastrophic financial loss to the investors. This paper introduces the application of a recently introduced machine learning model - the Transformer model, to predict the future price of stocks of Dhaka Stock Exchange (DSE), the leading stock exchange in Bangladesh. The transformer model has been widely leveraged for natural language processing and computer vision tasks, but, to the best of our knowledge, has never been used for stock price prediction task at DSE. Recently the introduction of time2vec encoding to represent the time series features has made it possible to employ the transformer model for the stock price prediction. This paper concentrates on the application of transformer-based model to predict the price movement of eight specific stocks listed in DSE based on their historical daily and weekly data. Our experiments demonstrate promising results and acceptable root mean squared error on most of the stocks.
Deep Transformer Model with Pre-Layer Normalization for COVID-19 Growth Prediction
Fitra, Rizki Ramadhan, Yudistira, Novanto, Mahmudy, Wayan Firdaus
Coronavirus disease or COVID-19 is an infectious disease caused by the SARS-CoV-2 virus. The first confirmed case caused by this virus was found at the end of December 2019 in Wuhan City, China. This case then spread throughout the world, including Indonesia. Therefore, the COVID-19 case was designated as a global pandemic by WHO. The growth of COVID-19 cases, especially in Indonesia, can be predicted using several approaches, such as the Deep Neural Network (DNN). One of the DNN models that can be used is Deep Transformer which can predict time series. The model is trained with several test scenarios to get the best model. The evaluation is finding the best hyperparameters. Then, further evaluation was carried out using the best hyperparameters setting of the number of prediction days, the optimizer, the number of features, and comparison with the former models of the Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN). All evaluations used metric of the Mean Absolute Percentage Error (MAPE). Based on the results of the evaluations, Deep Transformer produces the best results when using the Pre-Layer Normalization and predicting one day ahead with a MAPE value of 18.83. Furthermore, the model trained with the Adamax optimizer obtains the best performance among other tested optimizers. The performance of the Deep Transformer also exceeds other test models, which are LSTM and RNN.
Application of multilayer perceptron with data augmentation in nuclear physics
Bahtiyar, Hüseyin, Soydaner, Derya, Yüksel, Esra
Neural networks have become popular in many fields of science since they serve as promising, reliable and powerful tools. In this work, we study the effect of data augmentation on the predictive power of neural network models for nuclear physics data. We present two different data augmentation techniques, and we conduct a detailed analysis in terms of different depths, optimizers, activation functions and random seed values to show the success and robustness of the model. Using the experimental uncertainties for data augmentation for the first time, the size of the training data set is artificially boosted and the changes in the root-mean-square error between the model predictions on the test set and the experimental data are investigated. Our results show that the data augmentation decreases the prediction errors, stabilizes the model and prevents overfitting. The extrapolation capabilities of the MLP models are also tested for newly measured nuclei in AME2020 mass table, and it is shown that the predictions are significantly improved by using data augmentation.